Transfer Learning Approaches for Building Cross-Language Dense Retrieval Models

Abstract

The advent of transformer-based models such as BERT has led to the rise of neural ranking models. These models have improved the effectiveness of retrieval systems well beyond that of lexical term matching models such as BM25. While monolingual retrieval tasks have benefited from large-scale training collections such as MS MARCO and advances in neural architectures, cross-language retrieval tasks have fallen behind these advancements. This paper introduces ColBERT-X, a generalization of the ColBERT multi-representation dense retrieval model that uses the XLM-RoBERTa (XLM-R) encoder to support cross-language information retrieval (CLIR). ColBERT-X can be trained in two ways. In zero-shot training, the system is trained on the English MS MARCO collection, relying on the XLM-R encoder for cross-language mappings. In translate-train, the system is trained on the MS MARCO English queries coupled with machine translations of the associated MS MARCO passages. Results on ad hoc document ranking tasks in several languages demonstrate substantial and statistically significant improvements of these trained dense retrieval models over traditional lexical CLIR baselines.
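The abstract calls ColBERT a "multi-representation" model but does not spell out how such a model scores a document. The sketch below illustrates the late-interaction (MaxSim) scoring published for ColBERT: each query token vector is matched to its most similar document token vector, and the maxima are summed. It is a minimal illustration and not the authors' implementation. The function names (`l2_normalize`, `maxsim_score`, `rank`) and the toy two-dimensional vectors are assumptions made for this example; in ColBERT-X the token vectors would come from the XLM-R encoder.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def l2_normalize(vec: Vector) -> List[float]:
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return [0.0 for _ in vec]
    return [x / norm for x in vec]


def maxsim_score(query_vecs: Sequence[Vector], doc_vecs: Sequence[Vector]) -> float:
    """Late-interaction score: for each query token vector, take the highest
    cosine similarity against any document token vector, then sum the maxima."""
    if not doc_vecs:
        return 0.0
    q = [l2_normalize(v) for v in query_vecs]
    d = [l2_normalize(v) for v in doc_vecs]
    total = 0.0
    for qv in q:
        total += max(sum(a * b for a, b in zip(qv, dv)) for dv in d)
    return total


def rank(query_vecs: Sequence[Vector],
         docs: Dict[str, Sequence[Vector]]) -> List[Tuple[str, float]]:
    """Score every document and return (doc_id, score) pairs, best first."""
    scored = [(doc_id, maxsim_score(query_vecs, vecs)) for doc_id, vecs in docs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy data (hypothetical): two query token vectors, two candidate documents.
query = [[1.0, 0.0], [0.0, 1.0]]
docs = {
    "doc_a": [[2.0, 0.0], [0.0, 3.0]],  # covers both query tokens
    "doc_b": [[1.0, 0.0]],              # covers only the first query token
}
ranking = rank(query, docs)
```

Because every token keeps its own vector, a shared multilingual encoder can in principle place an English query token near its counterpart in a non-English document, which is the cross-language mapping the abstract relies on in the zero-shot setting.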

Similar resources

Alternative Approaches for Cross-Language Text Retrieval

The explosive growth of the Internet and other sources of networked information has made automatic mediation of access to networked information sources an increasingly important problem. Much of this information is expressed as electronic text, and it is becoming practical to automatically convert some printed documents and recorded speech to electronic text as well. Thus automated systems cap...

Different approaches to Cross Language Information Retrieval

This paper describes two experiments in the domain of Cross Language Information Retrieval. Our basic approach is to translate queries word by word using machine-readable dictionaries. The first experiment compared different strategies to deal with word sense ambiguity: i) keeping all translations and integrating translation probabilities in the model, ii) a single translation is selected on the ...

Assessing Wikipedia-Based Cross-Language Retrieval Models

... saved me a great deal of time through their help with the machine translations.

Cross-language Transfer of Multilingual Phoneme Models

We present a method to use speech data from multiple languages to enhance the performance of a flexible vocabulary command word recognizer which is trained using a small amount of speech data of the target language. We develop data-driven approaches for identification of multilingual phoneme units and mapping of these units to the target language phonemes, and evaluate them against the knowledg...

An automated linguistic knowledge-based cross-language transfer method for building acoustic models for a language without native training data

In this paper we describe an automated, linguistic knowledge-based method for building acoustic models for a target language for which there is no native training data. The method assumes availability of well-trained acoustic models for a number of existing source languages. It employs statistically derived phonetic and phonological distance metrics, particularly a combined phonetic-phonological...

Journal

Journal title: Lecture Notes in Computer Science

Year: 2022

ISSN: 1611-3349, 0302-9743

DOI: https://doi.org/10.1007/978-3-030-99736-6_26